On the Quest for Perfect Load Balance in Loop-Based Parallel Computations

Author

  • Rizos Sakellariou
Abstract

Loop structures are a potentially rich source of parallelism in programs written in a high-level programming language, such as Fortran. Parallelising loop structures, by assigning different loop iterations to each processor of a parallel computer and executing them there, may lead to dramatic improvements in performance. Parallelising compilers aim to exploit this potential by converting a sequential program into a semantically equivalent parallel form by means of a sequence of appropriately selected transformations. To achieve this, they need mapping schemes that distribute the computational work embodied in a parallel loop across the multiple processors as evenly as possible. Ideally, each processor is assigned exactly the same amount of computational work, in which case perfect load balance is achieved; otherwise, some load imbalance is said to exist. This thesis investigates the extent to which perfect load balance can be attained when parallelising members of the class of loop nests whose bounds are either constant or linear expressions involving the indices of the surrounding loops. First, an algorithm for counting the number of iterations of a given loop nest is developed. This algorithm can handle symbolic variables, that is, variables whose value is not known at compile time. The resulting, possibly symbolic, count can be used to estimate the execution time of the loop nest. Using this algorithm as a basis for the quantitative evaluation of load imbalance, the main body of the thesis develops a compile-time load-balancing strategy for mapping members of this class of loop nests. The strategy associates an appropriate mapping scheme with each loop nest, depending on the amount of computational work the nest contains. At the heart of the strategy, a connection is established with an old problem of number theory, the Prouhet-Tarry-Escott problem.
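To make the notions of an iteration count and of load imbalance concrete, the sketch below uses a hypothetical triply nested loop, `for i in 1..n, for j in 1..i, for k in 1..j`, whose bounds are linear in the surrounding indices. Its iteration count, n(n+1)(n+2)/6, is checked against a brute-force count, and the work of outer iteration i is taken to be i(i+1)/2. The outer iterations are then mapped onto two processors using block, cyclic and Thue-Morse partitions; the last is Prouhet's classic construction for the Prouhet-Tarry-Escott problem, which splits 2^(k+1) consecutive integers into two sets with equal sums of powers up to degree k. The loop nest, the function names and the imbalance measure (maximum load divided by mean load, minus one) are assumptions made for illustration; this is not the strategy developed in the thesis, only one way of seeing why that number-theoretic problem is relevant.

```python
# Hypothetical loop nest with bounds linear in the outer indices:
#   for i in 1..n:
#     for j in 1..i:
#       for k in 1..j:
#         body


def count_iterations_brute(n: int) -> int:
    """Count iterations of the nest by walking its loop structure."""
    return sum(1 for i in range(1, n + 1)
                 for j in range(1, i + 1)
                 for _k in range(1, j + 1))


def count_iterations_closed(n: int) -> int:
    """Closed form of sum_{i=1..n} sum_{j=1..i} sum_{k=1..j} 1 = n(n+1)(n+2)/6."""
    return n * (n + 1) * (n + 2) // 6


def work(i: int) -> int:
    """Iterations executed by outer iteration i: sum_{j=1..i} j = i(i+1)/2."""
    return i * (i + 1) // 2


def block_mapping(n: int, p: int) -> list[list[int]]:
    """Contiguous chunks of ceil(n/p) outer iterations per processor."""
    size = -(-n // p)  # ceiling division
    return [list(range(q * size + 1, min((q + 1) * size, n) + 1)) for q in range(p)]


def cyclic_mapping(n: int, p: int) -> list[list[int]]:
    """Outer iteration i goes to processor (i - 1) mod p."""
    return [list(range(q + 1, n + 1, p)) for q in range(p)]


def thue_morse_mapping(n: int) -> list[list[int]]:
    """Two processors: i goes to processor popcount(i - 1) mod 2 (Thue-Morse).

    Each aligned run of 2**(k+1) indices is split into two sets with equal
    sums of powers up to degree k (Prouhet's solution of the Prouhet-Tarry-Escott
    problem), so quadratic work is balanced whenever n is a multiple of 8.
    """
    parts: list[list[int]] = [[], []]
    for i in range(1, n + 1):
        parts[bin(i - 1).count("1") % 2].append(i)
    return parts


def imbalance(parts: list[list[int]]) -> tuple[list[int], float]:
    """Per-processor loads and imbalance measured as max(load) / mean(load) - 1."""
    loads = [sum(work(i) for i in part) for part in parts]
    mean = sum(loads) / len(loads)
    return loads, max(loads) / mean - 1


if __name__ == "__main__":
    n = 8
    print(count_iterations_brute(n), count_iterations_closed(n))  # 120 120
    for name, parts in [("block", block_mapping(n, 2)),
                        ("cyclic", cyclic_mapping(n, 2)),
                        ("Thue-Morse", thue_morse_mapping(n))]:
        loads, imb = imbalance(parts)
        print(f"{name:10} {parts} loads={loads} imbalance={imb:.3f}")
```

With n = 8, block mapping gives loads of 20 and 100, cyclic mapping 50 and 70, and the Thue-Morse partition 60 and 60. The exact balance holds here because n is a multiple of 8 and the work per outer iteration is a quadratic polynomial in i; deeper nests with higher-degree work would need longer runs of indices.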
Finally, a comparative analysis of related mapping schemes is conducted. Experimental results on a virtual shared memory parallel computer, the KSR1, show that, in many circumstances, the strategy proposed in this thesis achieves better performance than the related schemes.

Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

1. Copyright in text of this thesis rests with the Author. Copies (by any process) either in full, or of extracts, may be made only in accordance with instructions given by the Author and lodged in the John Rylands University Library of Manchester. Details may be obtained from the Librarian. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the Author.

2. The ownership of any intellectual property rights which may be described in this thesis is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of the University, which will prescribe the terms and conditions of any such agreement. Further information on the conditions under which disclosures and exploitation may take place is available from the Head of Department of Computer Science.

Στους γονείς μου, Ιωάννη και Ελένη
To my parents, Ioannis and Eleni

Acknowledgements

This thesis could not have been completed without the help of a number of people; it is a pleasure to acknowledge them. My supervisor, Professor John Gurd, has been an invaluable source of continuous support, advice and encouragement. I am particularly grateful to him for his patience and his rigorous attention while I was writing this thesis.
This work owes much to all past and present members of the Centre for Novel Computing, who were always prepared to discuss and answer my questions. In particular, I would like to thank my office mates in room 2.126 of the Department of Computer Science. The numerous discussions I had with Mike O'Boyle, Gholam Hedayat and Zbigniew Chamski over the last four years contributed significantly to the ideas expressed in this thesis; I am especially grateful to Mike for his help, in various ways, during the last months in which this thesis was being written. Henry Okora Okoyo and Armando Fortuna contributed to the creation of a stimulating environment in which it was a pleasure to work. Special thanks are also due to Elena Stöhr for constructive comments on an earlier draft of this thesis.

During the years in which the research for this thesis was undertaken, as well as in the years that led to this stage, many people helped in their own way, and I am grateful to them all. However, the sacrifices of two people have been by far unparalleled: my parents, who, alongside my sister, have always been an inexhaustible source of support. This thesis is dedicated to them.

Finally, I am indebted to the State Scholarships Foundation of Greece (Ίδρυμα Κρατικών Υποτροφιών, Ι.Κ.Υ.) for providing financial support.

This thesis was typeset using the LaTeX document preparation system. The use of Mathematica for carrying out several of the computations presented in the text is also acknowledged.

πολλὰ τὰ δεινὰ κοὐδὲν ἀνθρώπου δεινότερον πέλει·
Σοφοκλῆς, Ἀντιγόνη

Many wonders there be, but naught more wondrous than man.
Sophocles, Antigone

Notation

The notation used in the thesis is fairly standard; for easy reference, the symbols used are listed below, along with a short explanation:

  • $\lfloor x \rfloor$: the greatest integer less than or equal to $x$.
  • $\lceil x \rceil$: the least integer greater than or equal to $x$.
  • $m \mid n$: $m$ divides $n$, i.e. there exists an integer $k$ such that $n = mk$, with $m, n$ integers.
  • $m \nmid n$: $m$ does not divide $n$, i.e. there exists no integer $k$ such that $n = mk$, with $m, n$ integers.
  • $\gcd(m, n)$: the greatest common divisor of $m$ and $n$.
  • $[a, b]$: all values of $x$ (real or integer) such that $a \le x \le b$.
  • $\max_{l \le i \le u}(x_i)$: the maximum value of $x_i$ over all integer values of $i$ in $[l, u]$.
  • $\mathrm{sign}(x)$: returns $-1$, $0$ or $1$, depending on whether $x$ is negative, zero or positive, respectively.
  • $\wedge$: logical and.
  • $\vee$: logical or.
  • $A \cap B$: set intersection.
  • $A \cup B$: set union.

A number of loop-related terms dominate much of the thesis. Although these are established in the literature, readers unfamiliar with them may consult Section 1.4.2.2. Particular mention is made of the notion of a canonical loop nest, introduced in this thesis, which is explained by Definitions 4.1 and 4.2. Finally, throughout the thesis, the end of the proofs of theorems and lemmata is marked by QED, an abbreviation of the Latin phrase Quod Erat Demonstrandum (i.e. which was to be proved). The symbol □ is used to mark the end of examples.

Publication date: 1998